Lexicalization in Crosslinguistic Probabilistic Parsing: The Case of French
Abstract
This paper presents the first probabilistic parsing results for French, using the recently released French Treebank. We start with an unlexicalized PCFG as a baseline model, which we enrich to the level of Collins’ Model 2 by adding lexicalization and subcategorization. We also test the lexicalized sister-head model and a bigram model to deal with the flat structure of the French Treebank. The bigram model achieves the best performance: 81% constituency F-score and 84% dependency accuracy. All lexicalized models outperform the unlexicalized baseline. This is consistent with probabilistic parsing results for English but contrary to results for German, where lexicalization has only a limited effect on parsing performance.
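The abstract does not spell out why a bigram model suits a flat treebank. One common intuition can be sketched in Python: a flat constituent has many distinct daughter sequences, so a plain PCFG that estimates each whole rule by relative frequency gives zero probability to any sequence it never saw, whereas factoring the right-hand side into daughter-to-daughter bigrams can generalize from transitions it did see. This is a minimal, unlexicalized illustration under an assumed conditioning scheme (each daughter depends only on the parent label and the preceding daughter, with no smoothing); it is not the paper's model, which is lexicalized and builds on Collins’ Model 2. The function names, the `<s>`/`</s>` boundary symbols and the toy rules are hypothetical.

```python
from collections import Counter

# Hypothetical boundary symbols marking the start and end of a daughter sequence.
START, STOP = "<s>", "</s>"


def train_rule_models(rules):
    """Count whole rules and daughter bigrams from (lhs, rhs) pairs.

    Toy illustration only: no lexical heads, no smoothing.
    """
    rules = [(lhs, tuple(rhs)) for lhs, rhs in rules]
    rule_counts = Counter(rules)                      # c(lhs -> rhs)
    lhs_counts = Counter(lhs for lhs, _ in rules)     # c(lhs)
    bigram_counts = Counter()                         # c(lhs, prev, cur)
    context_counts = Counter()                        # c(lhs, prev)
    for lhs, rhs in rules:
        seq = (START,) + rhs + (STOP,)
        for prev, cur in zip(seq, seq[1:]):
            bigram_counts[(lhs, prev, cur)] += 1
            context_counts[(lhs, prev)] += 1
    return rule_counts, lhs_counts, bigram_counts, context_counts


def pcfg_prob(lhs, rhs, rule_counts, lhs_counts):
    """Relative-frequency PCFG estimate P(rhs | lhs); 0 for unseen rules."""
    if lhs_counts[lhs] == 0:
        return 0.0
    return rule_counts[(lhs, tuple(rhs))] / lhs_counts[lhs]


def bigram_prob(lhs, rhs, bigram_counts, context_counts):
    """Product of P(cur | lhs, prev) over the daughter sequence, STOP included."""
    seq = (START,) + tuple(rhs) + (STOP,)
    p = 1.0
    for prev, cur in zip(seq, seq[1:]):
        denom = context_counts[(lhs, prev)]
        if denom == 0:
            return 0.0
        p *= bigram_counts[(lhs, prev, cur)] / denom
    return p


if __name__ == "__main__":
    # Toy flat NP rules (illustrative tags): D=determiner, A=adjective, N=noun.
    toy = [("NP", ("D", "N")), ("NP", ("D", "N", "PP")), ("NP", ("D", "A", "N"))]
    rc, lc, bc, cc = train_rule_models(toy)
    unseen = ("D", "A", "N", "PP")
    print(pcfg_prob("NP", unseen, rc, lc))    # unseen whole rule
    print(bigram_prob("NP", unseen, bc, cc))  # built from seen bigrams
```

On these three toy NP rules, the unseen sequence D A N PP gets probability 0 under the whole-rule estimate but 1/9 under the bigram factorization, which reuses the observed transitions D→A, A→N and N→PP. A real parser would add smoothing and, as in the paper, lexical conditioning.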
Similar resources
Cross parser evaluation : a French Treebanks study
This paper presents preliminary investigations on the statistical parsing of French by bringing a complete evaluation on French data of the main probabilistic lexicalized and unlexicalized parsers first designed on the Penn Treebank. We adapted the parsers on the two existing treebanks of French (Abeillé et al., 2003; Schluter and van Genabith, 2007). To our knowledge, mostly all of the results...
Cross Parser Evaluation and Tagset Variation : a French Treebank Study
This paper presents preliminary investigations on the statistical parsing of French by bringing a complete evaluation on French data of the main probabilistic lexicalized and unlexicalized parsers first designed on the Penn Treebank. We adapted the parsers on the two existing treebanks of French (Abeillé et al., 2003; Schluter and van Genabith, 2007). To our knowledge, mostly all of the results...
Lexicalization of Probabilistic Grammars
Two general methods for the lexicalization of probabilistic grammars are presented which are modular, powerful and require only a small number of parameters. The first method multiplies the unlexicalized parse tree probability with the exponential of the mutual information terms of all word-governor pairs in the parse. The second lexicalization method accounts for the dependencies between the dii...
Graphes paramétrés et outils de lexicalisation
Shifting to a lexicalized grammar reduces the number of parsing errors and improves application results. However, such an operation affects a syntactic parser in all its aspects. One of our research objectives is to design a realistic model for grammar lexicalization. We carried out experiments for which we used a grammar with very simple content and formalism, and a very informative syntactic ...
Using subcategorization frames to improve French probabilistic parsing
This article introduces results about probabilistic parsing enhanced with a word clustering approach based on a French syntactic lexicon, the Lefff (Sagot, 2010). We show that by applying this clustering method on verbs and adjectives of the French Treebank (Abeillé et al., 2003), we obtain accurate performances on French with a parser based on a Probabilistic Context-Free Grammar (Petrov et al....